In this paper, we propose a novel meta-learning approach for automatic channel pruning of very deep neural networks. We first train a PruningNet, a kind of meta network, which is able to generate weight parameters for any pruned structure given the target network. We use a simple stochastic structure sampling method for training the PruningNet. Then, we apply an evolutionary procedure to search for good-performing pruned networks. The search is highly efficient because the weights are directly generated by the trained PruningNet and no finetuning is needed at search time. With a single PruningNet trained for the target network, we can search for various pruned networks under different constraints with little human participation. Compared to state-of-the-art pruning methods, we demonstrate superior performance on MobileNet V1/V2 and ResNet. Code is available at https://github.com/liuzechun/MetaPruning. This work was done while Zechun Liu and Haoyuan Mu were interns at Megvii Technology.
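As a rough illustration, the constrained evolutionary search described above can be sketched as follows. The fitness function here is a toy stand-in: in MetaPruning itself, the per-layer channel encoding would be fed to the trained PruningNet to generate weights, and fitness would be validation accuracy. The encoding, budget, and operator details below are assumptions for illustration, not the paper's exact procedure.

```python
import random

random.seed(0)  # for reproducibility of this sketch

NUM_LAYERS = 4
MAX_CHANNELS = 64

def evaluate(encoding):
    """Toy stand-in fitness. In MetaPruning this step would use the
    PruningNet-generated weights and measure validation accuracy
    directly, with no finetuning."""
    return sum(encoding) / (NUM_LAYERS * MAX_CHANNELS)

def random_candidate():
    return [random.randint(8, MAX_CHANNELS) for _ in range(NUM_LAYERS)]

def mutate(encoding, prob=0.3):
    return [random.randint(8, MAX_CHANNELS) if random.random() < prob else c
            for c in encoding]

def crossover(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

def search(generations=10, pop_size=20, top_k=5, budget=200):
    # Initialize with candidates that satisfy the total channel budget,
    # a stand-in here for a FLOPs or latency constraint.
    pop = [c for c in (random_candidate() for _ in range(1000))
           if sum(c) <= budget][:pop_size]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        elite = pop[:top_k]
        children = []
        while len(children) < pop_size - top_k:
            child = mutate(crossover(*random.sample(elite, 2)))
            if sum(child) <= budget:  # enforce the constraint
                children.append(child)
        pop = elite + children
    return max(pop, key=evaluate)

best = search()
print(best)  # per-layer channel counts of the best candidate found
```

Because candidates are evaluated without any finetuning, searching under a different budget only means rerunning this loop with a new constraint, which is what makes a single trained PruningNet reusable.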
Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part. Towards this end, masking has emerged as a generic and powerful tool where content is withheld along the sequential dimension, e.g., spatial in images, temporal in audio, and syntactic in language. In this paper, we explore the orthogonal channel dimension for generic data augmentation. The data for each channel is quantized through a non-uniform quantizer, with the quantized value sampled randomly within randomly sampled quantization bins. From another perspective, quantization is analogous to channel-wise masking, as it removes the information within each bin but preserves the information across bins. We apply the randomized quantization in conjunction with sequential augmentations on self-supervised contrastive models. This generic approach achieves results on par with modality-specific augmentation on vision tasks, and state-of-the-art results on 3D point clouds as well as on audio. We also demonstrate this method to be applicable for augmenting intermediate embeddings in a deep neural network on the comprehensive DABS benchmark, which comprises various data modalities. Code is available at http://www.github.com/microsoft/random_quantize.
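As a rough illustration, the channel-wise randomized quantization described above can be sketched in NumPy. The bin-edge sampling scheme (uniform over the channel's value range) and the choice of one random representative value per bin are assumptions for illustration, not necessarily the paper's exact recipe:

```python
import numpy as np

def randomized_quantize(x, num_bins=8, rng=None):
    """Randomized non-uniform quantization of one channel: bin edges are
    sampled at random, and every value falling in a bin is replaced by a
    single value sampled uniformly within that bin, removing information
    within each bin while preserving it across bins."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = x.min(), x.max()
    # Randomly sampled interior edges yield a non-uniform quantizer.
    inner = np.sort(rng.uniform(lo, hi, size=num_bins - 1))
    edges = np.concatenate(([lo], inner, [hi]))
    # One random representative value per bin.
    reps = rng.uniform(edges[:-1], edges[1:])
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, num_bins - 1)
    return reps[idx]

def augment(image, num_bins=8, rng=None):
    """Apply the quantizer independently to each channel of an (H, W, C)
    array, so each channel gets its own random bins and values."""
    rng = np.random.default_rng() if rng is None else rng
    return np.stack([randomized_quantize(image[..., c], num_bins, rng)
                     for c in range(image.shape[-1])], axis=-1)
```

Since each channel draws its own bins and representatives, every call produces a different "masking" of within-bin detail, which is what makes the operation usable as a stochastic augmentation.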
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based, and of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Left-ventricular ejection fraction (LVEF) is an important indicator of heart failure. Existing methods for LVEF estimation from video require large amounts of annotated data to achieve high performance, e.g. using 10,030 labeled echocardiogram videos to achieve a mean absolute error (MAE) of 4.10. However, labeling these videos is time-consuming and limits potential downstream applications to other heart diseases. This paper presents the first semi-supervised approach for LVEF prediction. Unlike general video prediction tasks, LVEF prediction is specifically related to changes in the left ventricle (LV) in echocardiogram videos. By incorporating knowledge learned from predicting LV segmentations into LVEF regression, we can provide additional context to the model for better predictions. To this end, we propose a novel Cyclical Self-Supervision (CSS) method for learning video-based LV segmentation, which is motivated by the observation that the heartbeat is a cyclical process with temporal repetition. Prediction masks from our segmentation model can then be used as additional input for LVEF regression to provide spatial context for the LV region. We also introduce teacher-student distillation to distill the information from LV segmentation masks into an end-to-end LVEF regression model that only requires video inputs. Results show our method outperforms alternative semi-supervised methods and can achieve an MAE of 4.17, which is competitive with state-of-the-art supervised performance, using half the number of labels. Validation on an external dataset also shows improved generalization ability from using our method. Our code is available at https://github.com/xmed-lab/CSS-SemiVideo.
Capturing long-range dependencies has been empirically shown to be effective on various computer vision tasks. Progress on this topic has been achieved through the Transformer framework, with the help of the multi-head attention mechanism. However, attention-based image patch interaction can suffer from redundant interactions among intra-class patches and undirected interactions among inter-class patches. In this paper, we propose a novel Graph Reasoning Transformer (GReaT) for image parsing, which enables image patches to interact following a relation-reasoning pattern. Specifically, linearly embedded image patches are first projected into graph space, where each node represents the implicit visual center of a cluster of image patches and each edge reflects the relation weight between two adjacent nodes. Global relation reasoning is then performed on this graph accordingly. Finally, all nodes, including the relation information, are mapped back to the original space for subsequent processing. Compared to conventional Transformers, GReaT has higher interaction efficiency and a more purposeful interaction pattern. Experiments are carried out on the challenging Cityscapes and ADE20K datasets. Results show that GReaT achieves consistent performance gains over state-of-the-art Transformer baselines with only a slight computational overhead.
Given the probability that a source provides accurate information (trustworthiness), weighted majority voting (WMV) is a well-known optimal decision rule for collective decision making. In practice, however, trustworthiness is not a quantity known to the decision maker; they must instead rely on an estimate called trust. A (machine learning) algorithm for computing trust is called unbiased when it has the property of systematically neither over- nor underestimating trustworthiness. To formally analyze the uncertainty of the decision process, we introduce and analyze two important properties of such unbiased trust values: stability of correctness and stability of optimality. Stability of correctness means that the decision accuracy the decision maker believes they achieved equals the actual accuracy; we prove that stability of correctness holds. Stability of optimality means that decisions made based on trust are as good as those that would be made based on trustworthiness; stability of optimality does not hold. We analyze the difference between the two and provide bounds on it. We also give an overview of how sensitive decision correctness is to changes in trust and trustworthiness.
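For binary opinions from independent sources, the classical optimal WMV rule weights each vote by the log-odds of its source's accuracy. The sketch below shows this rule, with a toy simulation; in the abstract's setting the true trustworthiness values `trust` would be replaced by estimated trust, which is exactly the gap being analyzed. The specific source accuracies are made-up example values.

```python
import math
import random

def wmv(votes, accuracies):
    """Weighted majority vote over binary opinions (+1/-1). The weight
    w_i = log(p_i / (1 - p_i)) gives the classical optimal rule when
    p_i is the true trustworthiness of source i; in practice p_i is
    replaced by an estimate (trust)."""
    weights = [math.log(p / (1.0 - p)) for p in accuracies]
    score = sum(w * v for w, v in zip(weights, votes))
    return 1 if score >= 0 else -1

# Toy simulation with known ground truth and independent sources.
random.seed(0)
truth = 1
trust = [0.9, 0.6, 0.55, 0.7]  # assumed per-source accuracies
trials = 2000
correct = 0
for _ in range(trials):
    votes = [truth if random.random() < p else -truth for p in trust]
    correct += wmv(votes, trust) == truth
print(correct / trials)  # empirical decision accuracy of the rule
```

Note how a highly reliable source dominates: with these weights, log(0.9/0.1) exceeds the combined weight of the three weaker sources, so the rule effectively follows the strongest source, and mis-estimating that source's trustworthiness would distort the decision most.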
This paper studies the few-shot skin disease classification problem. Based on the crucial observation that skin disease images often exist as multiple sub-clusters within a class (i.e., the appearance of images within one class of disease varies and forms multiple distinct sub-groups), we design a novel Sub-Cluster-Aware Network, named SCAN, to improve the accuracy of rare skin disease diagnosis. Since the performance of few-shot learning depends largely on the quality of the learned feature encoder, the main principle guiding the design of SCAN is the intrinsic sub-cluster representation learning of each class, so as to better describe the feature distributions. Specifically, SCAN follows a dual-branch framework: the first branch learns class-wise features to distinguish different skin diseases, and the second branch learns features that can effectively partition each class into several groups, so as to preserve the sub-cluster structure within each class. To achieve the objective of the second branch, we propose a cluster loss to learn image similarities via unsupervised clustering. To ensure that the samples in each sub-cluster come from the same class, we further design a purity loss to refine the unsupervised clustering results. We evaluate the proposed approach on two public datasets for few-shot skin disease classification. Experimental results verify that our framework outperforms other state-of-the-art methods by around 2% to 4% on the SD-198 and Derm7pt datasets.
Nuclei segmentation from histology images is a fundamental task in digital pathology analysis. However, deep-learning-based nuclei segmentation methods often suffer from limited annotations. This paper proposes a realistic data augmentation method for nuclei segmentation, named InsMix, which follows a Copy-Paste-Smooth principle and performs morphology-constrained generative instance augmentation. Specifically, we propose morphology constraints that enable the augmented images to acquire substantial information about nuclei while maintaining their morphological characteristics (e.g., geometry and location). To fully exploit the pixel redundancy of the background and improve the model's robustness, we further propose a background perturbation method, which randomly shuffles background patches without disordering the original nuclei distribution. To achieve contextual consistency between the original and template instances, a smooth GAN is designed with a foreground similarity encoder (FSE) and a triplet loss. We validate the proposed method on two datasets, i.e., the Kumar and CPS datasets. Experimental results demonstrate the effectiveness of each component and the superior performance of our method compared to state-of-the-art methods.
Vision transformers have recently set off a new wave in the field of medical image analysis due to their remarkable performance on various computer vision tasks. However, recent hybrid-/transformer-based approaches mainly focus on the benefits of transformers in capturing long-range dependencies, while ignoring the issues of their daunting computational complexity, high training costs, and redundant dependencies. In this paper, we propose adaptive pruning of transformers for medical image segmentation and present a lightweight and effective hybrid network, APFormer. To our best knowledge, this is the first work on transformer pruning for medical image analysis tasks. The key features of APFormer are self-supervised self-attention (SSA), which improves the convergence of dependency establishment; Gaussian-prior relative position embedding (GRPE), which fosters the learning of position information; and adaptive pruning, which eliminates redundant computations and perception information. Specifically, SSA and GRPE take the well-converged dependency distribution and the Gaussian heatmap distribution, respectively, as prior knowledge for self-attention and position embedding, to ease the training of the transformer and lay a solid foundation for the subsequent pruning operations. Then, adaptive transformer pruning, both query-wise and dependency-wise, is performed by adjusting gate control parameters for both complexity reduction and performance improvement. Extensive experiments on two widely used datasets demonstrate the significant segmentation performance of APFormer against state-of-the-art methods with fewer parameters and lower GFLOPs. More importantly, we prove through ablation studies that adaptive pruning can work as a plug-and-play module to improve other hybrid-/transformer-based methods. Code is available at https://github.com/xianlin7/apformer.